[Reorganize] Reorganize intra-node examples and add tests #52
Rachmanino wants to merge 3 commits into main
📝 Walkthrough
The PR removes a distributed all-gather GEMM example file and introduces a new test suite for intranode distributed examples (allgather_gemm_overlapped, gemm_rs_overlapped, and sp_ag_attention_intra_node) that spawns multi-process CUDA execution on compute capability 9.0 hardware.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@examples/distributed/intranode/test_intranode.py`:
- Around line 13-14: The test passes args=(2, None) which sends None into the
main functions and causes AttributeError when they access attributes; update the
test (test_example_allgather_gemm_overlapped) to pass a minimal valid args
object (e.g., an argparse.Namespace or simple object) containing the attributes
expected by example_allgather_gemm_overlapped.main,
example_gemm_rs_overlapped.main and example_sp_ag_attention_intra_node.main (at
least persistent for the first two, and batch_size, q_head, etc. for the
attention example), or alternately add defensive checks inside those main
functions to handle args is None before accessing attributes — pick one approach
and implement it consistently for the three mains named above.
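The second option above (handling `args is None` inside each `main`) could look roughly like the following sketch. The helper `resolve_args`, the `DEFAULT_ARGS` table, and the parameter names `local_rank` and `num_local_ranks` are hypothetical and not taken from the repository; only the `persistent` attribute is named in the review.

```python
import argparse

# Hypothetical defaults; the real examples define their own argument parsers.
DEFAULT_ARGS = {"persistent": False}


def resolve_args(args, defaults=DEFAULT_ARGS):
    """Return a Namespace containing every key in `defaults`, tolerating args=None."""
    merged = dict(defaults)
    if args is not None:
        # Caller-supplied values override the defaults.
        merged.update(vars(args))
    return argparse.Namespace(**merged)


def main(local_rank, num_local_ranks, args):
    # Stand-in for an example's main(); only the argument handling is shown.
    args = resolve_args(args)
    return args.persistent
```

With this shape, a test can keep passing `None` while command-line runs still supply a full namespace; whether that is preferable to fixing the test call is a judgment for the maintainers.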
🧹 Nitpick comments (1)
examples/distributed/intranode/test_intranode.py (1)
10-28: Consider `pytest.mark.parametrize` to reduce boilerplate.
All three tests share the same decorator stack and spawn logic, differing only in the imported module. A parametrized test would reduce duplication:
♻️ Optional refactor
+@tilelang.testing.requires_distributed
+@tilelang.testing.requires_cuda
+@tilelang.testing.requires_cuda_compute_version_eq(9, 0)
+@pytest.mark.parametrize("module", [
+    example_allgather_gemm_overlapped,
+    example_gemm_rs_overlapped,
+    example_sp_ag_attention_intra_node,
+])
+def test_intranode_example(module):
+    torch.multiprocessing.spawn(module.main, args=(2, None), nprocs=2)
def test_example_allgather_gemm_overlapped():
    torch.multiprocessing.spawn(example_allgather_gemm_overlapped.main, args=(2, None), nprocs=2)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, find the location of the example modules
find . -name "example_allgather_gemm_overlapped.py" -o -name "example_gemm_rs_overlapped.py" -o -name "example_sp_ag_attention_intra_node.py" | head -20
Repository: tile-ai/tilescale
Length of output: 264
🏁 Script executed:
#!/bin/bash
# Extract main function definitions with context
for f in ./examples/distributed/intranode/example_allgather_gemm_overlapped.py ./examples/distributed/intranode/example_gemm_rs_overlapped.py ./examples/distributed/intranode/example_sp_ag_attention_intra_node.py; do
  echo "=== $(basename $f) ==="
  sed -n '/^def main/,/^def \|^class \|^$/p' "$f" | head -40
done
Repository: tile-ai/tilescale
Length of output: 931
🏁 Script executed:
#!/bin/bash
# Get the full main function for each module (up to 60 lines)
for f in ./examples/distributed/intranode/example_allgather_gemm_overlapped.py ./examples/distributed/intranode/example_gemm_rs_overlapped.py ./examples/distributed/intranode/example_sp_ag_attention_intra_node.py; do
  echo "=== $(basename $f) ==="
  sed -n '/^def main/,/^def \|^class \|^$/p' "$f" | head -60
  echo ""
done
Repository: tile-ai/tilescale
Length of output: 934
🏁 Script executed:
#!/bin/bash
# Check the test file to confirm args pattern
echo "=== test_intranode.py ==="
cat -n examples/distributed/intranode/test_intranode.py
echo ""
echo "=== Checking for any args usage in example_sp_ag_attention_intra_node.py ==="
grep -n "args\." examples/distributed/intranode/example_sp_ag_attention_intra_node.py || echo "No args. accesses found"
Repository: tile-ai/tilescale
Length of output: 1792
All three main functions will crash with AttributeError when args=None is passed.
The tests call torch.multiprocessing.spawn(..., args=(2, None), nprocs=2), which passes None as the third argument to each main function. However:
- `example_allgather_gemm_overlapped.main` and `example_gemm_rs_overlapped.main` unconditionally access `args.persistent` on line 5.
- `example_sp_ag_attention_intra_node.main` unconditionally accesses `args.batch_size`, `args.q_head`, and multiple other attributes (lines 293-304, 409).
All three will fail with AttributeError: 'NoneType' object has no attribute .... Either pass valid args or add None checks to all attribute accesses in these functions.
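The failure mode and the "pass valid args" fix can be reproduced without GPUs in a small sketch. It relies on the documented behavior that `torch.multiprocessing.spawn` calls `fn(i, *args)` for each process index `i`; `fake_spawn` and `example_main` are hypothetical in-process stand-ins, not the library function or the repository's code.

```python
import argparse


def fake_spawn(fn, args=(), nprocs=1):
    """Simplified in-process stand-in for spawn: calls fn(i, *args) per index."""
    return [fn(i, *args) for i in range(nprocs)]


def example_main(local_rank, num_local_ranks, args):
    # Mirrors the reported pattern: an unconditional attribute access on args.
    return (local_rank, num_local_ranks, args.persistent)


# Passing None as the last element reproduces the reported AttributeError.
try:
    fake_spawn(example_main, args=(2, None), nprocs=2)
    failed = False
except AttributeError:
    failed = True

# Passing a minimal Namespace with the expected attribute avoids it.
minimal_args = argparse.Namespace(persistent=False)
results = fake_spawn(example_main, args=(2, minimal_args), nprocs=2)
```

For the attention example the namespace would also need `batch_size`, `q_head`, and the other attributes its `main` reads; suitable values are not given in the review, so they are left out here.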
Intranode examples were not previously covered by CI. I also think we need to sort the examples into intranode, IPC-based, and others.
rdc-related issues
Summary by CodeRabbit
Tests
Chores